{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-14T01:59:38.597380Z",
     "start_time": "2025-01-14T01:59:38.352134Z"
    }
   },
   "outputs": [],
   "source": [
    "from rag import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-14T02:12:17.403776Z",
     "start_time": "2025-01-14T02:12:17.074465Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "30\n"
     ]
    }
   ],
   "source": [
    "result = search_papers(query='Learning Retrieval Augmentation for Personalized Dialogue Generation', top_k=30)\n",
    "# result = search_papers(query='section name: Mitigating Spurious Correlations. topic: Aspect Based Sentiment Analysis.', top_k=100)\n",
    "print(len(result))"
   ]
  },
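  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The retriever also seems to accept loosely structured queries, as in the\n",
    "# commented-out example above. A minimal sketch of that style; the\n",
    "# 'section name: ... topic: ...' phrasing is illustrative, not a fixed syntax.\n",
    "structured = search_papers(\n",
    "    query='section name: Related Work. topic: Retrieval Augmented Generation.',\n",
    "    top_k=10,\n",
    ")\n",
    "print(len(structured))"
   ]
  },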
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-14T02:12:20.899559Z",
     "start_time": "2025-01-14T02:12:20.893956Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id': 454845728459088896,\n",
       " 'distance': 0.8027275204658508,\n",
       " 'entity': {'paper_id': '6576dca3939a5f40821b0e5d',\n",
       "  'paper_title': 'Learning Retrieval Augmentation for Personalized Dialogue Generation.',\n",
       "  'chunk_id': 0,\n",
       "  'chunk_text': '# Learning Retrieval Augmentation for Personalized Dialogue Generation\\nQiushi Huang , Shuai $\\\\mathbf{F}\\\\mathbf{u}^{2}$ , Xubo Liu 1 ,Wenwu Wang 1 ,Tom $\\\\mathbf{Ko}^{3}$ , Yu Zhang 2 ∗, Lilian Tang 1 University of Surrey, Southern University of Science and Technology, ByteDance AI Lab {qiushi.huang, xubo.liu, w.wang, h.tang} $@$ surrey.ac.uk, {fus.akhasi, tomkocse, yu.zhang.ust} $@$ gmail.com\\n\\n# Abstract\\nPersonalized dialogue generation, focusing on generating highly tailored responses by leveraging persona profiles and dialogue context, has gained significant attention in conversational AI applications. However, persona profiles, a prevalent setting in current personalized dialogue datasets, typically composed of merely four to five sentences, may not offer comprehensive descriptions of the persona about the agent, posing a challenge to generate truly personalized dialogues. To handle this problem, we propose Learning Retrieval A ugmentation for Personalized Dial Ogue Generation ( LAPDOG ), which studies the potential of leveraging external knowledge for persona dialogue generation. Specifically, the proposed LAPDOG model consists of a story retriever and a dialogue generator. The story retriever uses a given persona profile as queries to retrieve relevant information from the story document, which serves as a supplementary context to augment the persona profile. The dialogue generator utilizes both the dialogue history and the augmented persona profile to generate personalized responses. For optimization, we adopt a joint training framework that collaboratively learns the story retriever and dialogue generator, where the story retriever is optimized towards desired ultimate metrics (e.g., BLEU) to retrieve content for the dialogue generator to generate personalized responses. Experiments conducted on the CONVAI2 dataset with ROCStory as a supplementary data source show that the proposed LAPDOG method substantially outperforms the baselines, indicating the effectiveness of the proposed method. The LAPDOG model code is publicly available for further exploration. 1',\n",
       "  'original_filename': 'Conf_Paper_Meta_Data_EMNLP_2023_with_whole_text.db',\n",
       "  'year': 2023}}"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[0]"
   ]
  },
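  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A small convenience sketch over the hit schema shown above ('distance' plus\n",
    "# an 'entity' dict with 'paper_id'). Hits appear to arrive sorted with higher\n",
    "# distance meaning a closer match; the 0.6 cutoff is an arbitrary illustration.\n",
    "def top_unique_papers(hits, min_distance=0.6):\n",
    "    seen, unique = set(), []\n",
    "    for hit in hits:\n",
    "        pid = hit['entity']['paper_id']\n",
    "        if hit['distance'] >= min_distance and pid not in seen:\n",
    "            seen.add(pid)\n",
    "            unique.append(hit)\n",
    "    return unique\n",
    "\n",
    "print(len(top_unique_papers(result)))"
   ]
  },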
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-09T02:49:30.528199Z",
     "start_time": "2025-01-09T02:49:30.518887Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0th: # 1 Introduction-distance: 0.682511568069458\n",
      "1th: # 1 Introduction-distance: 0.6775100827217102\n",
      "2th: # 1 Introduction-distance: 0.673060417175293\n",
      "3th: # 2 Related Work-distance: 0.6700544357299805\n",
      "4th: # Aspect-Based Sentiment Analysis with Explicit Sentiment Augmentations-distance: 0.6698358654975891\n",
      "5th: # 2 Related Work-distance: 0.6673760414123535\n",
      "6th: # Aspect-oriented Opinion Alignment Network for Aspect-Based Sentiment Classification-distance: 0.6650732755661011\n",
      "7th: # 1 Introduction-distance: 0.6583970189094543\n",
      "8th: # Related Work-distance: 0.6519566774368286\n",
      "9th: # Introduction-distance: 0.6503339409828186\n",
      "10th: # Related Work-distance: 0.6464039087295532\n",
      "11th: # Cross-Domain Data Augmentation with Domain-Adaptive Language Modeling for Aspect-Based Sentiment Analysis-distance: 0.6411128640174866\n",
      "12th: # Aspect-to-opinion Direction-distance: 0.6407355666160583\n",
      "13th: # Listing 2: Few-shot Prompt (10 shots) for ASQP (R15).-distance: 0.6383765339851379\n",
      "14th: # 4.4 Case Study & Error analysis-distance: 0.6368587017059326\n",
      "15th: # 6.2 Aspect Evaluation-distance: 0.6361230611801147\n",
      "16th: # 2 Related Work-distance: 0.6313188672065735\n",
      "17th: # 2 Related Work-distance: 0.6312605142593384\n",
      "18th: # 1 I NTRODUCTION-distance: 0.6310967803001404\n",
      "19th: # Improving Aspect Sentiment Quad Prediction via Template-Order Data Augmentation-distance: 0.6303401589393616\n",
      "20th: # Towards Unifying the Label Space for Aspect- and Sentence-based Sentiment Analysis-distance: 0.6290284991264343\n",
      "21th: # Span-level Bidirectional Cross-attention Framework for Aspect Sentiment Triplet Extraction-distance: 0.6270270347595215\n",
      "22th: # Counterfactual-Enhanced Information Bottleneck for Aspect-Based Sentiment Analysis-distance: 0.6264548897743225\n",
      "23th: # MVP: Multi-view Prompting Improves Aspect Sentiment Tuple Prediction-distance: 0.6259329915046692\n",
      "24th: # 2 RELATED WORKS-distance: 0.6256837844848633\n",
      "25th: # Experimental Setup-distance: 0.6239959001541138\n",
      "26th: # Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis-distance: 0.6223631501197815\n",
      "27th: # BiSyn-GAT $^{\\mp}$ : Bi-Syntax Aware Graph Attention Network for Aspect-based Sentiment Analysis-distance: 0.6202107071876526\n",
      "28th: # 7 Conclusion-distance: 0.6199389696121216\n",
      "29th: # Sentiment Polarity Constraints-distance: 0.6140410304069519\n",
      "30th: # 1 Background and Related Work-distance: 0.6137765645980835\n",
      "31th: # CCorrelation Between Story Quality and Aspect Rating-distance: 0.6108079552650452\n",
      "32th: # Training Entire-Space Models for Target-oriented Opinion Words Extraction-distance: 0.6056584715843201\n",
      "33th: # 3.3 Fine-tuning-distance: 0.6055320501327515\n",
      "34th: # 5 Experimental Evaluation-distance: 0.6044978499412537\n",
      "35th: # 2 Related Work-distance: 0.6032508611679077\n",
      "36th: # F.4 Results of Aspect Category Classification-distance: 0.602325439453125\n",
      "37th: # AX-MABSA: A Framework for Extremely Weakly Supervised Multi-label Aspect Based Sentiment Analysis-distance: 0.6012232303619385\n",
      "38th: # 1.9 Sentiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133-distance: 0.6011866927146912\n",
      "39th: # CA DDITIONAL RESULTS-distance: 0.5996102690696716\n",
      "40th: # 7.3 More Analysis-distance: 0.5983064770698547\n",
      "41th: # D.8 Details on OABS datasets-distance: 0.5965851545333862\n",
      "42th: # 3.3 Inter -Context Module-distance: 0.5961902141571045\n",
      "43th: # 2 Preliminaries on Generative ASQP-distance: 0.595742404460907\n",
      "44th: # Target-to-Source Augmentation for Aspect Sentiment Triplet Extraction-distance: 0.595664381980896\n",
      "45th: # Aspect: Direction-distance: 0.5914393067359924\n",
      "46th: # 3 Task Formulation-distance: 0.5901967883110046\n",
      "47th: # M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal Aspect-based Sentiment Analysis-distance: 0.5901787877082825\n",
      "48th: # 5.4 Performance Analysis-distance: 0.5900112390518188\n",
      "49th: # 3.3 Pre-training Tasks-distance: 0.5893905162811279\n",
      "50th: # OPEN A SP : A Benchmark for Multi-document Open Aspect-based Summarization-distance: 0.5890423655509949\n",
      "51th: # 2 Related Work-distance: 0.5889747142791748\n",
      "52th: # Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis-distance: 0.5883775949478149\n",
      "53th: # 2 Methodology-distance: 0.5867764949798584\n",
      "54th: # 5.2 Aspect Summarization-distance: 0.586685061454773\n",
      "55th: # 4.2 Baselines-distance: 0.5865994691848755\n",
      "56th: # 4.3.2 Threshold Hyperparameter in Inference-distance: 0.5861906409263611\n",
      "57th: # 4.2 Impact on Decision-Making-distance: 0.5855630040168762\n",
      "58th: # Acknowledgments-distance: 0.5845654010772705\n",
      "59th: # 4.5 Ablation Study-distance: 0.5844199061393738\n",
      "60th: # 1 Introduction-distance: 0.5821180939674377\n",
      "61th: # 4 EXPERIMENTS-distance: 0.5814165472984314\n",
      "62th: # 2 Related Works-distance: 0.5811510682106018\n",
      "63th: # 7 Conclusion-distance: 0.5808697938919067\n",
      "64th: # Abstract-distance: 0.580053448677063\n",
      "65th: # A Novel Energy based Model Method for Multi-modal Aspect-Based Sentiment Analysis-distance: 0.5800360441207886\n",
      "66th: # 6. Ablation studies-distance: 0.5800317525863647\n",
      "67th: # 3 Our Framework and Related Work-distance: 0.5797387957572937\n",
      "68th: # 5.2 Estimated Attribute Space-distance: 0.5795393586158752\n",
      "69th: # Acknowledgements-distance: 0.5793282985687256\n",
      "70th: # 3 Proposed Methodology-distance: 0.5778992772102356\n",
      "71th: # 4.2 Experimental Settings-distance: 0.5775094032287598\n",
      "72th: # DDetailed Results-distance: 0.5761134028434753\n",
      "73th: # 3.1 Sentiment Word Position Detection-distance: 0.5759567618370056\n",
      "74th: # A Datasets-distance: 0.5758122801780701\n",
      "75th: # 5.3 Ablation Studies-distance: 0.5749250054359436\n",
      "76th: # Introduction-distance: 0.5732985734939575\n",
      "77th: # 2 Related Work-distance: 0.5732028484344482\n",
      "78th: # 2 Related Works-distance: 0.5730876326560974\n",
      "79th: # 5.4 Model Prediction on Special Cases-distance: 0.572967529296875\n",
      "80th: # 6 Analyzing Marked Words: Pernicious Positive Portrayals-distance: 0.5727958083152771\n",
      "81th: # 4.3 Ablation Study-distance: 0.571487545967102\n",
      "82th: # 4.4 Ablation study and analysis-distance: 0.5711238384246826\n",
      "83th: # 5.2. Ablation Studies-distance: 0.5708805322647095\n",
      "84th: # 4.4. Ablation Study-distance: 0.5707154870033264\n",
      "85th: # Tagging-Assisted Generation Model with Encoder and Decoder Supervision for Aspect Sentiment Triplet Extraction-distance: 0.5699883103370667\n",
      "86th: # 4.4. Ablation Study-distance: 0.5698820948600769\n",
      "87th: # 5.2 Dataset Collection-distance: 0.5694810748100281\n",
      "88th: # Additional Experiments-distance: 0.5689876079559326\n",
      "89th: # 5.2 Influence of Aspect Representation-distance: 0.5689645409584045\n",
      "90th: # 4.1. User Study-distance: 0.5687441825866699\n",
      "91th: # 4. Experimental results-distance: 0.5677317380905151\n",
      "92th: # Ablation Study-distance: 0.5676039457321167\n",
      "93th: # 4.4 Ablation Study (RQ3)-distance: 0.5674641132354736\n",
      "94th: # A.4.3 A BLATION ON CORPUS SOURCES OF NEGATIVE LABELS .-distance: 0.5674524307250977\n",
      "95th: # A Self-enhancement Multitask Framework for Unsupervised Aspect Category Detection-distance: 0.5673087239265442\n",
      "96th: # 4.3 Ablation Study-distance: 0.566775918006897\n",
      "97th: # 5.3. Quantitative Results-distance: 0.5662375688552856\n",
      "98th: # 5.3. Effects of Visual Features-distance: 0.5660723447799683\n",
      "99th: # 4.3 Overall-to-Part Ablation Study-distance: 0.5659483075141907\n",
      "100th: # D. Additional Evaluation-distance: 0.5658649206161499\n",
      "101th: # 2RELATED WORK-distance: 0.565825343132019\n",
      "102th: # A.2. User survey as ablation-distance: 0.565507709980011\n",
      "103th: # there is higher document-diversity overall.-distance: 0.5654941201210022\n",
      "104th: # 5 Conclusion-distance: 0.5654222369194031\n",
      "105th: # 4RESULTS AND ANALYSIS-distance: 0.565366804599762\n",
      "106th: # 5.3 Compared Methods-distance: 0.5652830004692078\n",
      "107th: # 3.3.2 Calculate the relationship-populations-distance: 0.5651561617851257\n",
      "108th: # 3.5. Discussion-distance: 0.565061092376709\n",
      "109th: # 5.2. User Study-distance: 0.5649081468582153\n",
      "110th: # 4.3 Analysis of Human-Centered Biases-distance: 0.56459641456604\n",
      "111th: # 5.2 Aesthetic Critiques-distance: 0.5645478963851929\n",
      "112th: # 4.3. Analyses-distance: 0.5642884373664856\n",
      "113th: # 3.2 Preference Estimation-distance: 0.5639455318450928\n",
      "114th: # 3 EXPERIMENTS-distance: 0.5639201402664185\n",
      "115th: # 6.6 Feature Visualization-distance: 0.5633230805397034\n",
      "116th: # 4.3. Main Results on PartImageNet-distance: 0.5631535053253174\n",
      "117th: # 5. Results and Analysis-distance: 0.5628048777580261\n",
      "118th: # 2 AB BA : A New Annotated Corpus of Bias in Argumentative Text-distance: 0.5626876950263977\n",
      "119th: # 4.2.3 Adaptive Adversarial Perturbation (AAP)-distance: 0.5626277923583984\n",
      "120th: # 2. Related Work-distance: 0.5625928640365601\n",
      "121th: # 4.2 Experimental Settings-distance: 0.5624338984489441\n",
      "122th: # 4.4 Ablation Study & Analysis-distance: 0.5624039173126221\n",
      "123th: # 5.1. Results-distance: 0.5623059272766113\n",
      "124th: # 4.2. Animation Evaluation-distance: 0.5619425773620605\n",
      "125th: # .3 More Ablation Studies-distance: 0.5617083311080933\n",
      "126th: # 4.2. Object pose Evaluation-distance: 0.5615038275718689\n",
      "127th: # 7.6. User Study-distance: 0.561410665512085\n",
      "128th: # 6. Ablation Study and Dicussion-distance: 0.5605588555335999\n",
      "129th: # D.8. Attribute-Affordance Causal Relations-distance: 0.5604751110076904\n",
      "130th: # 7 CONCLUSION AND DISCUSSION-distance: 0.5603503584861755\n",
      "131th: # Understanding Aesthetics with Language: A Photo Critique Dataset for Aesthetic Assessment-distance: 0.5599961876869202\n",
      "132th: # 2. Related Work-distance: 0.5596977472305298\n",
      "133th: # 5 Experiments and Results-distance: 0.5596564412117004\n",
      "134th: # FINE MATCH : Aspect-based Fine-grained Image and Text Mismatch Detection and Correction-distance: 0.5596265196800232\n",
      "135th: # 4 StoryER-distance: 0.5594668388366699\n",
      "136th: # 3.1 Experiment Setup-distance: 0.5594211220741272\n",
      "137th: # DDATA CLEANING AND POSTPROCESSING-distance: 0.5591146945953369\n",
      "138th: # 4.4. Ablation Study-distance: 0.5591017603874207\n",
      "139th: # A Dataset Analysis-distance: 0.5588918924331665\n",
      "140th: # 4.3. Ablation Study-distance: 0.5588594079017639\n",
      "141th: # 4 RESULTS-distance: 0.5588161945343018\n",
      "142th: # C.2 Effect of Number of Comments-distance: 0.558781087398529\n",
      "143th: # 3 KNOWLEDGE GRAPH A UGMENTED NETWORK 3.1 Problem Formulation-distance: 0.5587316751480103\n",
      "144th: # 4.2 Prediction model-distance: 0.5585036873817444\n",
      "145th: # 5.2 Amortized Correction-distance: 0.5584930777549744\n",
      "146th: # 5.3 Ablation Study-distance: 0.5584207773208618\n",
      "147th: # 5.2 Overall Performance (RQ1)-distance: 0.5583678483963013\n",
      "148th: # 3.1 Theoretical Analysis-distance: 0.5581691265106201\n",
      "149th: # 4.2. Adversarial Examples under Adjusted Radius-distance: 0.5581685304641724\n",
      "150th: # A.7 Results on Sentiment Detection-distance: 0.5581358671188354\n",
      "151th: # 4. Attribute Selection: Demographic and Additional Visual Attributes-distance: 0.5578303337097168\n",
      "152th: # 6.5 In-Depth Analysis on Compositional Generalization-distance: 0.557826578617096\n",
      "153th: # 5 Evaluation of Behaviors in Chat-distance: 0.5575718283653259\n",
      "154th: # I.2. Video QA-distance: 0.5574812293052673\n",
      "155th: # 4.6. Ablation Study and Dicussions-distance: 0.5574363470077515\n",
      "156th: # 5.3 Human Evaluation-distance: 0.5573992133140564\n",
      "157th: # 1 Introduction-distance: 0.5572243332862854\n",
      "158th: # 5.3 Further Analysis-distance: 0.5569035410881042\n",
      "159th: # D.3 Against Defense-distance: 0.5567246079444885\n",
      "160th: # 5.4 Ablation Study-distance: 0.5567235946655273\n",
      "161th: # 7. Conclusions and Discussion-distance: 0.5567226409912109\n",
      "162th: # 4.5 A UTOMATIC EVALUATION RESULTS-distance: 0.5567009449005127\n",
      "163th: # 4.3. Ablation study-distance: 0.5566431879997253\n",
      "164th: # 6. Ablation Studies-distance: 0.5566425919532776\n",
      "165th: # 6.2 Dataset Analysis-distance: 0.5564456582069397\n",
      "166th: # 5.2 Ablation Study-distance: 0.5562916994094849\n",
      "167th: # 5.1.2 Generation of Pseudo Labels-distance: 0.5562899112701416\n",
      "168th: # D.3 Alternate Binning Approach-distance: 0.5561825037002563\n",
      "169th: # 4.2 Baselines-distance: 0.556158185005188\n",
      "170th: # 5.4 Ablation Study-distance: 0.5560402274131775\n",
      "171th: # B.1 More Ablation Study-distance: 0.5560339093208313\n",
      "172th: # 4.2 Multi-Aspect Control-distance: 0.5559698343276978\n",
      "173th: # I.1.2 A MAZON POLARITY-distance: 0.5558242797851562\n",
      "174th: # 5.3 Ablation-distance: 0.5556153655052185\n",
      "175th: # 4.2. Ablation and Analysis-distance: 0.5554158091545105\n",
      "176th: # 2 Tasks and L ABEL DESC Datasets-distance: 0.5553958415985107\n",
      "177th: # 5.1. Quantitative Analysis-distance: 0.5552257299423218\n",
      "178th: # 4.2. Qualitative Evaluation-distance: 0.5550980567932129\n",
      "179th: # 4.5 Ablation studies-distance: 0.5550135970115662\n",
      "180th: # BMore Results of APIS-distance: 0.554951012134552\n",
      "181th: # 4.5 Performance on Attribute, Counting, Spatial Reasoning-distance: 0.5547825694084167\n",
      "182th: # 4.3 Model Analysis-distance: 0.5547219514846802\n",
      "183th: # 5.5. Ablation Studies-distance: 0.5544120073318481\n",
      "184th: # 6 Analysis-distance: 0.5539968013763428\n",
      "185th: # B. Ablation Studies-distance: 0.5539737939834595\n",
      "186th: # 7 Ethical Considerations-distance: 0.5539427399635315\n",
      "187th: # 4.5 Ablation Study-distance: 0.5536848902702332\n",
      "188th: # FBASELINESDETAILS-distance: 0.5536405444145203\n",
      "189th: # 4 EXPERIMENT RESULTS-distance: 0.5535594820976257\n",
      "190th: # 2 Related Work-distance: 0.5535460114479065\n",
      "191th: # C. Supplemental Ablation Studies-distance: 0.5534821152687073\n",
      "192th: # 4.4. User Study-distance: 0.5534313917160034\n",
      "193th: # 4.3 Ablation Study-distance: 0.553402304649353\n",
      "194th: # 4.3. Discussions-distance: 0.5533590316772461\n",
      "195th: # 4.4.2 Features and Entity Enhancement-distance: 0.5533378720283508\n",
      "196th: # Acknowledgments and Disclosure of Funding-distance: 0.5533021688461304\n",
      "197th: # A.4. Complete Quantitative Results of P ART -STAD-distance: 0.5531349182128906\n",
      "198th: # 2.4 Present Study-distance: 0.5530542731285095\n",
      "199th: # 4.5. Ablation Study-distance: 0.5530385375022888\n",
      "200th: # C. Results C.1. Perceptual evaluation-distance: 0.5529500842094421\n",
      "201th: # 4.3. Ablation Studies-distance: 0.5528857707977295\n",
      "202th: # 5.2 ZERO -SHOT SENTIMENT STEERING-distance: 0.5528814196586609\n",
      "203th: # 5.3 Ablation Study-distance: 0.5528432726860046\n",
      "204th: # 4.3 Ablation Studies-distance: 0.5527654886245728\n",
      "205th: # 4.4. Ablation Studies-distance: 0.5527089834213257\n",
      "206th: # 5.2.3 Ablation Study-distance: 0.5523868203163147\n",
      "207th: # 5.3 Interactive Human Evaluation-distance: 0.5520772933959961\n",
      "208th: # 2 Related Work-distance: 0.5520504713058472\n",
      "209th: # 4.3 Ablation Study-distance: 0.5520248413085938\n",
      "210th: # 4.3. Ablations and Other Analysis-distance: 0.5520199537277222\n",
      "211th: # 5.1 Implications-distance: 0.5520198941230774\n",
      "212th: # 5.2. Ablation Study-distance: 0.5519700646400452\n",
      "213th: # 4.4. Ablation Study-distance: 0.5518446564674377\n",
      "214th: # 4.3. Conversation Edge Attributes Classification-distance: 0.5518368482589722\n",
      "215th: # 6CONCLUDING DISCUSSION-distance: 0.5518338680267334\n",
      "216th: # 5.2. Part Pose Estimation-distance: 0.5517809987068176\n",
      "217th: # 4.2.2 Performance analysis-distance: 0.5517762303352356\n",
      "218th: # 5.6 Accuracy of Prediction-distance: 0.5516858696937561\n",
      "219th: # 4.5 Ablation on Foveal Attention-distance: 0.5516818761825562\n",
      "220th: # 4.3 Ablation study and model analysis-distance: 0.5516716837882996\n",
      "221th: # Experiments-distance: 0.5516040325164795\n",
      "222th: # 3.3 BERT Encoder-distance: 0.5515838861465454\n",
      "223th: # Table 2: Ablation study of proposed VCCSM (LSTM & BERT Sentence Masking $^+$ Contextual $^+$ Consistency) on testing accuracy, area over the perturbation curve (AOPC), and post-hoc accuracy on RRP dataset.-distance: 0.5513363480567932\n",
      "224th: # 4.4. Ablation Studies-distance: 0.5511334538459778\n",
      "225th: # FExperiment Details 32-distance: 0.5510805249214172\n",
      "226th: # 5.6. Impact of sample size and attribute count on proposed metric-distance: 0.5509780645370483\n",
      "227th: # Algorithm 1 UCB-ADherence (UCB-AD)-distance: 0.5508626103401184\n",
      "228th: # 5.6 Parameters Analysis and Ablation Study-distance: 0.550822377204895\n",
      "229th: # Human Subjective Survey-distance: 0.5505739450454712\n",
      "230th: # 2. Related Work-distance: 0.5504893660545349\n",
      "231th: # 2 Related Work-distance: 0.5502209067344666\n",
      "232th: # 1. Introduction-distance: 0.5500688552856445\n",
      "233th: # 5.3 Performance on Adversarial Detection-distance: 0.5500633716583252\n",
      "234th: # 2. Related Work-distance: 0.5500583648681641\n",
      "235th: # A.1 Annotation Interfaces-distance: 0.5500481724739075\n",
      "236th: # 4.3. Ablation Study-distance: 0.5500434637069702\n",
      "237th: # 6.3 A BLATION STUDIES-distance: 0.5500409007072449\n",
      "238th: # I EMPIRICAL RESULTS REGARDING THEOREM 1-distance: 0.550035834312439\n",
      "239th: # 3.2 HUMAN VALIDATION-distance: 0.5499938726425171\n",
      "240th: # 3.2 Dataset Analysis-distance: 0.5498479604721069\n",
      "241th: # 3.2 Emotion-oriented Analysis.-distance: 0.5498266220092773\n",
      "242th: # 6 Experiment-distance: 0.5497085452079773\n",
      "243th: # 4.3 Impact on Search Experience-distance: 0.5495909452438354\n",
      "244th: # 7. Results-distance: 0.5494133830070496\n",
      "245th: # 6.3. Ablation Studies-distance: 0.5492150187492371\n",
      "246th: # 4.6 Analysis and Discussion-distance: 0.5491927862167358\n",
      "247th: # Topic-Guided Sampling For Data-Efficient Multi-Domain Stance Detection-distance: 0.5491893291473389\n",
      "248th: # 3.1 Language-oriented Analysis-distance: 0.5490984320640564\n",
      "249th: # 4.3. Quantitative Evaluation-distance: 0.5489609837532043\n",
      "250th: # 4.1. Annotation and image synthesis analysis-distance: 0.5488433837890625\n",
      "251th: # 6.2. Annotations and Recognition Accuracy-distance: 0.5488088130950928\n",
      "252th: # 4.5. Ablation Studies-distance: 0.548688530921936\n",
      "253th: # Adaptive Self-Consistency-distance: 0.548678457736969\n",
      "254th: # 5.3. Ablation Study-distance: 0.5486700534820557\n",
      "255th: # C Additional aggregation algorithms 21-distance: 0.5486648082733154\n",
      "256th: # 5.2. Component analysis-distance: 0.5486403107643127\n",
      "257th: # 7.5. Comparison with methods from the 2D domain-distance: 0.5486120581626892\n",
      "258th: # 4.1.1 Ablations-distance: 0.548603355884552\n",
      "259th: # 4.2. Ablation Study-distance: 0.5484575629234314\n",
      "260th: # 5.3. Ablation Studies-distance: 0.5483930110931396\n",
      "261th: # A.1. Human Parsing-distance: 0.5483905076980591\n",
      "262th: # 4.4. Ablation Studies-distance: 0.5483826994895935\n",
      "263th: # 4.4 Human Evaluation-distance: 0.5483774542808533\n",
      "264th: # B. Template Designs-distance: 0.5483641624450684\n",
      "265th: # 4.5. Detailed analysis-distance: 0.5483068823814392\n",
      "266th: # A.3 Ablation of Subject-Awareness-distance: 0.5481994152069092\n",
      "267th: # 4.1 Ablation Study-distance: 0.5481811761856079\n",
      "268th: # 5. Logically Consistent Prediction-distance: 0.548136293888092\n",
      "269th: # 1 Introduction-distance: 0.5480615496635437\n",
      "270th: # 4.5. Ablations-distance: 0.5480381846427917\n",
      "271th: # 4.3. Ablation Studies-distance: 0.54803466796875\n",
      "272th: # 4.6. Ablation study on the number of cameras-distance: 0.5480343699455261\n",
      "273th: # 6. Conclusion-distance: 0.5479907989501953\n",
      "274th: # 4.2. Ablation Study-distance: 0.5479469895362854\n",
      "275th: # 2 Related Work-distance: 0.5479323863983154\n",
      "276th: # 4.7 Impact of Sentence Length-distance: 0.5478566288948059\n",
      "277th: # E.5 A BLATION STUDY ON ONLINE REWARD FOR BASELINES .-distance: 0.5478304624557495\n",
      "278th: # 6. Acknowledgements-distance: 0.547807514667511\n",
      "279th: # 4.3.2 Comparison with Variants of Supervised Method-distance: 0.5476624965667725\n",
      "280th: # 5.1 Experimental Setup-distance: 0.547584056854248\n",
      "281th: # Acknowledgments-distance: 0.5474627017974854\n",
      "282th: # 5.2 Results-distance: 0.5474469661712646\n",
      "283th: # 4. Related Work-distance: 0.5474202632904053\n",
      "284th: # 3 Method 3-distance: 0.547360360622406\n",
      "285th: # 5.4 Ablations and Analysis-distance: 0.5473582744598389\n",
      "286th: # 4.7 Ablations on Fine-grained Contrastive Loss-distance: 0.5473532676696777\n",
      "287th: # 4.6 Attribution Visualization and Analysis-distance: 0.5473445057868958\n",
      "288th: # 5.3 Ablation Studies-distance: 0.5472144484519958\n",
      "289th: # 3.3 Pairwise Human Evaluation-distance: 0.5470980405807495\n",
      "290th: # 5.6 Content appeal enhancer baseline comparison-distance: 0.5469897985458374\n",
      "291th: # 4.4 Ablation Study-distance: 0.5469825267791748\n",
      "292th: # EAdditional results-distance: 0.5469145774841309\n",
      "293th: # 4.3 Ablation Studies-distance: 0.546890139579773\n",
      "294th: # 5.6 Qualitative Results-distance: 0.5468514561653137\n",
      "295th: # BGradient-based Analysis-distance: 0.5468084216117859\n",
      "296th: # 4.3 Comparing Active Learning strategies-distance: 0.5467460751533508\n",
      "297th: # 5.4 Ablation Study-distance: 0.5467126965522766\n",
      "298th: # CAdditional Discussions-distance: 0.5467098951339722\n",
      "299th: # 7 Conclusions-distance: 0.5466195940971375\n"
     ]
    }
   ],
   "source": [
    "for i in range(len(result)):\n",
    "    print(f\"{i}th: {result[i]['entity']['chunk_text'].split('\\n')[0]}-distance: {result[i]['distance']}\")"
   ]
  },
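  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Since several chunks of the same paper can rank highly, a quick tally of\n",
    "# which papers dominate the hit list (same hit schema as above).\n",
    "from collections import Counter\n",
    "\n",
    "title_counts = Counter(hit['entity']['paper_title'] for hit in result)\n",
    "for title, n in title_counts.most_common(5):\n",
    "    print(f\"{n:2d}  {title}\")"
   ]
  },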
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-09T02:00:32.299118Z",
     "start_time": "2025-01-09T02:00:29.180411Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "9\n"
     ]
    }
   ],
   "source": [
    "result = query_by_chunk_contain(chunk='Aspect Based Sentiment Analysis', top_k=30)\n",
    "print(len(result))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-09T02:40:06.219839Z",
     "start_time": "2025-01-09T02:40:06.215085Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id': 454845717673959450,\n",
       " 'paper_id': '6556d23d939a5f4082dbc78f',\n",
       " 'paper_title': 'A Self-enhancement Multitask Framework for Unsupervised Aspect Category Detection',\n",
       " 'chunk_id': 1,\n",
       " 'chunk_text': '# 1 Introduction\\nAspect Category Detection (ACD), Aspect Term Extraction (ATE), and Aspect Term Polarity (ATP) are three challenging sub-tasks of Aspect Based Sentiment Analysis, which aim to identify the aspect categories discussed (e.i., FOOD), all aspect terms presented (e.i., ‘fish’, ‘rolls’), and determine the polarity of each aspect term (e.i., ‘positive’, ‘negative’) in each sentence, respectively. To better understand these problems consider the example in Table 1 .  \\n\\nUnsupervised ACD has mainly been tackled by clustering sentences and manually mapping these clusters to corresponding golden aspects using top representative words ( He et al. ,2017 ;Luo et al. ,2019 ;Tulkens and van Cranenburgh ,2020 ;Shi et al. ,2021 ). However, this approach faces a major problem with the mapping process, requiring manual labeling and a strategy to resolve aspect label inconsistency within the same cluster. Recent works have introduced using seed words to tackle this problem ( Karamanolakis et al. ,2019 ;Nguyen et al. ,2021 ;Huang et al. ,2020 ). These works direct their attention to learning the embedding space for sentences and seed words to establish similarities between sentences and aspects. As such, aspect representations are limited by a fixed small number of the initial seed words. ( Li et al. ,2022 ) is one of the few works that aims to expand the seed word set from the vocabulary of a pretrained model. However, there is overlap among the additional seed words across different aspects, resulting in reduced discriminability between the aspects. Another challenge for the unsupervised ACD task is the presence of noise, which comes from out-of-domain sentences and incorrect pseudo labels. This occurs because data is often collected from various sources, and the pseudo labels are usually generated using a greedy strategy. Current methods handle noisy sentences by either treating them as having a GENERAL aspect ( He et al. ,2017 ;Shi et al. ,2021 ) or forcing them to have one of the desired aspects ( Tulkens and van Cranenburgh ,2020 ;Huang et al. ,2020 ;Nguyen et al. ,2021 ). To address incorrect pseudo labels, ( Huang et al. ,2020 ;Nguyen et al. ,2021 ) attempt to assign lower weights to uncertain pseudo-labels. However, these approaches might still hinder the performance of the model as models are learned from a large amount of noisy data.  \\n\\nIn this paper, we propose A Self-enhancement Multitask (ASeM) framework to address these limitations. Firstly, to enhance the understanding of aspects and reduce reliance on the quality of initial seed words, we propose a Seedword Enhancement Component (SEC) to construct a high-quality set Table 1: Aspect Category Detection, Aspect Term Extraction, and Aspect Term Polarity example.  \\n\\nof seed words from the initial set. The main idea is to add keywords that have limited connections with the initial seed words. Connections are determined by the similarity between the keyword’s context (the sentence containing the keywords) and the initial seed words. In this way, our task is simply to find sentences with low similarity to the initial seed words and then extract their important keywords to add to the seed word set. Since pseudo-label generation relies on the similarity between sentences and seed words, we expect that the enhanced seed word set would provide sufficient knowledge for our framework to generate highly confident pseudo-labels for as many sentences as possible. 
Secondly, to reduce noise in the training data, instead of treating them as having a GENERAL aspect ( He et al. ,2017 ;Shi et al. ,2021 ) or forcing them to have one of the desired aspects (Tulkens and van Cranenburgh ,2020 ;Huang et al. ,2020 ;Nguyen et al. ,2021 ), we propose to leverage a retrieval-based data augmentation technique to automatically search for high-quality data from a large training dataset. To achieve this, we leverage a paraphrastic encoder (e.g. Arora et al. (2017 ); Ethayarajh (2018 ); Du et al. (2021 )), trained to output similar representations for sentences with similar meanings, to generate sentence representations that are independent of the target task (ACD). Then, we generate task embeddings by passing the prior knowledge (e.g., seed words) about the target task to the encoder. These embeddings are used as a query to retrieve similar sentences from the large training dataset (data bank). In this way, our framework aims to effectively extract domain-specific sentences that can be well understood based on seed words, regardless of the quality of the training dataset.  \\n\\nAnother contribution to our work concerns the recommendation of multi-tasking learning for unsupervised ACD, ATE and ATP in a neural network. Intuitively, employing multi-task learning enables ACD to leverage the benefits of ATE and ATP. Referring to the example in Figure 1 , ATE extracts Opinion Target Expressions (OTEs): ‘fish’ and ‘rolls’ , which requires understanding the emotion ‘positive’ (expressed as ‘unquestionably fresh’ )for ‘fish’ and ‘negative’ (expressed as ‘inexplicably bland’ ) for ‘rolls’ . Meanwhile, ACD wants to detect the aspect category of the sentence, requiring awareness of the presence of \"fish\" and \"rolls\" (terms related to the aspect of \"food\") within the sentence. Despite the usefulness of these relationships for ACD, the majority of existing works do not utilize this information. ( Huang et al. ,2020 )is one of the few attempts to combine learning the representations of aspect and polarity at the sentence level before passing them through separate classifiers.  \\n\\nOur contributions are summarized as follows:  \\n\\n• We propose a simple approach to enhance aspect comprehension and reduce the reliance on the quality of initial seed words. Our framework achieves this by expanding the seed word set with keywords, based on their uncertain connection with the initial seed words. •A retrieval-based data augmentation technique is proposed to tackle training data noise. In this way, only data that connect well with the prior knowledge (e.g., seed words) about the target task is used during training. As a result, the model automatically filters out-of-domain data and low-quality pseudo labels. •We propose to leverage a multi-task learning approach for unsupervised ACD, ATE, and ATP, aiming to improve ACD through the utilization of additional guidance from other tasks during the shared representation learning. •Comprehensive experiments are conducted to demonstrate that our framework outperforms previous methods on standard datasets.',\n",
       " 'original_filename': 'Conf_Paper_Meta_Data_EMNLP_2023_with_whole_text.db'}"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[0]"
   ]
  },
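  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Unlike search_papers, the substring queries return flat records (no 'entity'\n",
    "# wrapper), so fields such as 'paper_title' and 'chunk_id' sit at the top level.\n",
    "for hit in result:\n",
    "    print(f\"chunk {hit['chunk_id']} of: {hit['paper_title']}\")"
   ]
  },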
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-14T03:13:53.143278Z",
     "start_time": "2025-01-14T03:13:52.696187Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n"
     ]
    }
   ],
   "source": [
    "result = query_by_title_contain(title='Teaching Large Language Models to Translate with Comparison', top_k=30)\n",
    "print(len(result))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-14T03:13:55.942340Z",
     "start_time": "2025-01-14T03:13:55.935319Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id': 454846711197427218,\n",
       " 'paper_id': '64acd41c3fda6d7f06b36599',\n",
       " 'paper_title': 'Teaching Large Language Models to Translate with Comparison',\n",
       " 'chunk_id': 0,\n",
       " 'chunk_text': '# TIM: Teaching Large Language Models to Translate with Comparison\\nJiali Zeng 1 , Fandong Meng 1 , Yongjing $\\\\mathbf{Yin^{2,1}}$ , Jie Zhou 1 1 Pattern Recognition Center, WeChat AI, Tencent Inc 2 Westlake University {lemonzeng,fandongmeng,withtomzhou}@tencent.com\\n\\n# Abstract\\nOpen-sourced large language models (LLMs) have demonstrated remarkable efficacy in various tasks with instruction tuning. However, these models can sometimes struggle with tasks that require more specialized knowledge such as translation. One possible reason for such deficiency is that instruction tuning aims to generate fluent and coherent text that continues from a given instruction without being constrained by any task-specific requirements. Moreover, it can be more challenging for tuning smaller LLMs with lower-quality training data. To address this issue, we propose a novel framework using examples in comparison to teach LLMs to learn translation. Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model’s learning. We evaluate our method on WMT2022 test sets and show that it outperforms existing methods. Our findings offer a new perspective on finetuning LLMs for translation tasks and provide a promising solution for generating high-quality translations. Please refer to Github for more details: https://github.com/lemon0830/TIM.\\n\\n# 1 Introduction\\nGenerative large language models, like GPT4, have shown remarkable performance in various NLP tasks ( Brown et al. ,2020 ;Ouyang et al. ,2022 ). For machine translation, the GPT models achieve very competitive translation quality, especially for highresource languages ( Hendy et al. ,2023 ;Zhu et al. ,2023 ), which opens up new possibilities for building more effective translation systems. It is impractical to deploy such large models for the translation task only, and using or tuning open-sourced generative language models has become an attractive research direction. In this regard, researchers have explored strategies for example selection and instruction design through In-Context Learning (ICL) ( Lin et al. ,2022 ;Agrawal et al. ,2022 ). However, evaluations of open-sourced LLMs like Bloom show that they do not perform as well as strong multilingual supervised baselines in most translation directions (Zhu et al. ,2023 ). Additionally, ICL can increase decoding latency due to the need for large models with long context. Based on these observations, researchers suggest tuning relatively small LLMs for translation with a few high-quality supervised instructions ( Zhu et al. ,2023 ;Hendy et al. ,2023 ;Jiao et al. ,2023 ).  \\n\\nInstruction tuning has been shown to be an efficient method for making LLMs better aligned to the task descriptions preferred by humans ( Stiennon et al. ,2020 ;Ouyang et al. ,2022 ;Chung et al. ,2022 ;Wang et al. ,2023 ). The only requirement is to collect task-specific data, and LLMs will be finetuned on the data with the language modeling loss. However, optimizing for simple next-token prediction loss will cause models to overlook context information, especially for low-capacity models. It is serious for the tasks in which the specialized knowledge in context is necessary for task completion, and ignoring such knowledge on translation can lead to inadequacy and hallucination. Therefore, there is a need to investigate the limitations of LLMs and explore methods for improving their performance in specialized tasks.  
\\n\\nIn this paper, we propose to teach the language models to learn translation with examples in comparison, aiming to make full use of a small amount of high-quality translation data. Based on the training data, we further construct two kinds of comparisons: output comparison and preference comparison. Output comparison is used to learn responses of different instructions for the same input. Preference comparison is used to maximize the gap between correct and incorrect translations. Specifically, in order to help identify specific areas where the model may be making errors, we introduce an additional preference loss during fine-tuning, which is used to learn reward models ( Stiennon et al. ,2020 ), as regularization to penalize unexpected outputs.  \\n\\nWe evaluate TIM on WMT22 test sets in four language directions ( $\\\\operatorname{EN}{\\\\leftrightarrow}\\\\operatorname{DE}$ ,$\\\\mathrm{EN}{\\\\leftrightarrow}\\\\mathrm{ZH}$ ), and the improvement over the baselines shows the effectiveness of our method. Our model shows better zero-shot translation performance and stability in prompt choice. As the size increases, the performance of the models trained with TIM increases, with the improvement being more pronounced in the case of smaller models. In particular, the tuned LLaMa-13B ( Touvron et al. ,2023 ) achieves top 1 on quality estimation without references in the $\\\\mathrm{EN}{\\\\leftrightarrow}\\\\mathrm{DE}$ , outperforming the dedicated models for quality estimation like COMET.\\n\\n# 2 Related Work\\nThe research of machine translation based on LLMs can be divided into two categories: LLMs as interface ( Lin et al. ,2022 ;Hendy et al. ,2023 ;Zhu et al. ,2023 ;Agrawal et al. ,2022 ) and instruction tuning ( Jiao et al. ,2023 ;Zhang et al. ,2023 ).  \\n\\nThe studies of using LLMs as interface focus on empirical analysis. For example, Hendy et al. (2023 ) evaluate ChatGPT, GPT3.5 (text-davinci003), and text-davinci-002 in eighteen different translation directions involving high and low resource languages. Zhu et al. (2023 ) further evaluate four popular LLMs (XGLM, BLOOMZ, OPT and ChatGPT) on 202 directions and 102 languages, and compare them with strong supervised baselines, which provides a more comprehensive benchmark result. Many efforts are also put into investigating translation exemplars selection strategy of incontext learning ( Lin et al. ,2022 ;Agrawal et al. ,2022 ). Another line of work introduces knowledge, such as word alignments extracted from a dictionary, to LLMs for better translation ( Lu et al. ,2023 ).  \\n\\nTuning smaller LLMs (e.g., 7B) for translation tasks is a promising direction since they are better at English than supervised translation models. However, even for directions from other languages to English, the gap between language models finetuned with translation data and supervised systems is still evident ( Jiao et al. ,2023 ;Zhang et al. ,2023 ). Different from them, we introduce output comparison and preference comparison data and present a preference regularization to alleviate hallucination and help LLMs learn translation better.  \\n\\nStandard Translation   \\nFigure 1: An example of a standard translation prompt.   
\\n\\n\\n<html><body><table><tr><td colspan=\"2\">Standardrlamslation Example1</td></tr><tr><td></td><td>Instruction:TranslatefromChinesetoEnglish.</td></tr><tr><td>Input:</td><td>国有企业和优势民营企业走进赣南革命老区。</td></tr><tr><td>Output:</td><td>State-ownedenterprises and advantageousprivate enterprisesentered therevolutionarybaseareaof southJiangxi.</td></tr></table></body></html>',\n",
       " 'original_filename': 'Conf_Paper_Meta_Data_AAAI2024_with_whole_text.db',\n",
       " 'year': 2024}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-09T06:36:40.455030Z",
     "start_time": "2025-01-09T06:36:40.450479Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0th: Leaving the Nest: Going Beyond Local Loss Functions for Predict-Then-Optimize\n",
      "1th: Robust Loss Functions for Training Decision Trees with Noisy Labels\n",
      "2th: Meta-tuning Loss Functions and Data Augmentation for Few-shot Object Detection.\n",
      "3th: AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks.\n"
     ]
    }
   ],
   "source": [
    "for i in range(len(result)):\n",
    "    print(f\"{i}th: {result[i]['paper_title']}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-14T03:14:04.057705Z",
     "start_time": "2025-01-14T03:14:03.773600Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n"
     ]
    }
   ],
   "source": [
    "result = query_by_paper_id(paper_id='454845653960902774', top_k=10)\n",
    "print(len(result))"
   ]
  },
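  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The call above returned 0 hits, likely because the value passed looks like a\n",
    "# numeric row id rather than the hex 'paper_id' field stored on each record.\n",
    "# A sketch of the same lookup with a paper_id copied from an earlier hit:\n",
    "result = query_by_paper_id(paper_id='64acd41c3fda6d7f06b36599', top_k=10)\n",
    "print(len(result))"
   ]
  },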
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'detail': 'Not Found'}\n"
     ]
    }
   ],
   "source": [
    "result = query_keyword_by_title(title='Cross-Domain Data Augmentation with Domain-Adaptive Language Modeling for Aspect-Based Sentiment Analysis.')\n",
    "print(result)"
   ]
  },
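  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# query_keyword_by_title answers {'detail': 'Not Found'} when the exact title\n",
    "# is missing; the trailing period in the query above may be the culprit. A\n",
    "# defensive sketch: retry without it, then fall back to a substring lookup.\n",
    "title = 'Cross-Domain Data Augmentation with Domain-Adaptive Language Modeling for Aspect-Based Sentiment Analysis'\n",
    "result = query_keyword_by_title(title=title)\n",
    "if isinstance(result, dict) and result.get('detail') == 'Not Found':\n",
    "    candidates = query_by_title_contain(title=title, top_k=5)\n",
    "    print([hit['paper_title'] for hit in candidates])\n",
    "else:\n",
    "    print(result)"
   ]
  },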
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "agent",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
